Implementation Plan: HPC-Specific CUI Compliance Roles
Branch: 004-hpc-cui-roles | Date: 2026-02-15 | Spec: spec.md
Input: Feature specification from /specs/004-hpc-cui-roles/spec.md
Summary
Implement five HPC-specific Ansible roles and two automation playbooks that integrate CUI compliance with research computing operations. The roles address Slurm job scheduling security (prolog/epilog scripts), container runtime restrictions (Apptainer), parallel filesystem ACLs and monitoring (Lustre/BeeGFS), node lifecycle management (PXE, sanitization), and interconnect security documentation (InfiniBand RDMA exceptions). Researcher onboarding/offboarding automation ties these together with FreeIPA, Duo, and Slurm account management.
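The ACL-to-FreeIPA reconciliation mentioned above comes down to a set comparison between directory group membership and the users currently named on a project directory's ACL. The sketch below illustrates that comparison only. The names `AclPlan` and `plan_acl_sync` are hypothetical, and the inputs are assumed to have already been fetched from FreeIPA and the filesystem; it is not the planned `acl_sync.py`.

```python
from typing import NamedTuple


class AclPlan(NamedTuple):
    """Changes needed to bring a project ACL in line with the directory group."""
    grant: set[str]   # users in the FreeIPA group but missing from the ACL
    revoke: set[str]  # users on the ACL who are no longer in the group


def plan_acl_sync(ipa_members: set[str], acl_users: set[str]) -> AclPlan:
    """Hypothetical helper: diff group membership against current ACL entries."""
    return AclPlan(grant=ipa_members - acl_users, revoke=acl_users - ipa_members)


if __name__ == "__main__":
    plan = plan_acl_sync({"alice", "bob"}, {"bob", "carol"})
    print("grant:", sorted(plan.grant))
    print("revoke:", sorted(plan.revoke))
```

Computing the plan separately from applying it keeps the step idempotent: a second run against an already-synchronized ACL yields two empty sets.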
Technical Context
Language/Version: Python 3.9+, Bash (POSIX-compliant for Slurm scripts)
Primary Dependencies: Ansible 2.15+, Slurm 23.x+, Apptainer 1.2+, FreeIPA client, Lustre/BeeGFS client tools
Storage: Lustre or BeeGFS parallel filesystem, local /tmp and /dev/shm for job scratch
Testing: Molecule (Ansible role testing), pytest (Python scripts), Slurm test partition
Target Platform: RHEL 9 / Rocky Linux 9 compute nodes, Slurm controller
Project Type: Ansible collection (roles + playbooks + plugins)
Performance Goals: Prolog overhead <30 seconds, epilog sanitization <60 seconds, ACL sync <5 minutes
Constraints: Must integrate with existing Specs 001-003 infrastructure, FIPS 140-2 compliant
Scale/Scope: 50-500 compute nodes, 10-50 concurrent CUI projects, 100-1000 researchers
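The performance goals are plain upper bounds, so a test harness could time each stage and compare the result against its budget. The following is a minimal sketch under that assumption. `BUDGETS_SECONDS`, `timed_run`, and `over_budget` are hypothetical names, not part of the spec, and the budget values are copied from the goals listed above.

```python
import subprocess
import time

# Budgets in seconds, taken from the stated performance goals.
BUDGETS_SECONDS = {"prolog": 30.0, "epilog": 60.0, "acl_sync": 300.0}


def timed_run(command: list[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


def over_budget(measured: dict[str, float]) -> dict[str, float]:
    """Return the stages whose duration is not strictly below their budget."""
    return {
        stage: seconds
        for stage, seconds in measured.items()
        if seconds >= BUDGETS_SECONDS[stage]
    }
```

In a pytest case on the Slurm test partition, the measured values would come from `timed_run` on the deployed scripts, and the assertion would be that `over_budget(...)` returns an empty dict.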
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Principle | Status | Evidence |
|---|---|---|
| I. Plain Language First | PASS | FR-014, FR-020, FR-042 require plain language READMEs and PI welcome packets |
| II. Data Model as Source of Truth | PASS | FR-046 updates hpc_tailoring.yml; roles consume existing control_mapping.yml |
| III. Compliance as Code | PASS | All roles implement verifiable controls with Ansible tags; FR-013 integrates with evidence collection |
| IV. HPC-Aware | PASS | Entire spec addresses HPC-specific tailoring; FR-029 through FR-031 document interconnect exceptions |
| V. Multi-Framework | PASS | Roles map to NIST 800-171 controls via Spec 001 data model |
| VI. Audience-Aware Documentation | PASS | Separate docs for researchers (FR-020), PIs (FR-042), admins (role READMEs) |
| VII. Idempotent and Auditable | PASS | All roles follow main.yml/verify.yml/evidence.yml pattern from constitution |
| VIII. Prefer Established Tools | PASS | Uses Slurm, Apptainer, FreeIPA, Lustre, and nvidia-smi; no custom security tooling |
Gate Result: PASS - All 8 principles satisfied. Proceeding to Phase 0.
Project Structure
Documentation (this feature)
specs/004-hpc-cui-roles/
├── plan.md              # This file
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
│   └── README.md        # Internal contracts (no external APIs)
└── tasks.md             # Phase 2 output (/speckit.tasks command)
Source Code (repository root)
roles/
├── hpc_slurm_cui/
│   ├── tasks/
│   │   ├── main.yml                    # Configure Slurm partition, deploy prolog/epilog
│   │   ├── verify.yml                  # Verify partition config and script deployment
│   │   └── evidence.yml                # Collect job accounting evidence
│   ├── templates/
│   │   ├── slurm_prolog.sh.j2          # Authorization check, audit logging
│   │   ├── slurm_epilog.sh.j2          # Memory scrub, GPU reset, health check
│   │   └── cui_partition.conf.j2       # Slurm partition configuration
│   ├── files/
│   │   └── README_researchers.md       # Plain language CUI partition guide
│   ├── defaults/main.yml
│   ├── vars/main.yml
│   └── meta/main.yml
│
├── hpc_container_security/
│   ├── tasks/
│   │   ├── main.yml                    # Configure Apptainer security
│   │   ├── verify.yml                  # Verify container restrictions
│   │   └── evidence.yml                # Collect container execution logs
│   ├── templates/
│   │   ├── apptainer.conf.j2           # Security configuration
│   │   └── container_wrapper.sh.j2     # Execution logging wrapper
│   ├── files/
│   │   └── README_containers.md        # Researcher container guide
│   ├── defaults/main.yml
│   ├── vars/main.yml
│   └── meta/main.yml
│
├── hpc_storage_security/
│   ├── tasks/
│   │   ├── main.yml                    # Configure filesystem security
│   │   ├── verify.yml                  # Verify ACLs and quotas
│   │   └── evidence.yml                # Collect changelog evidence
│   ├── templates/
│   │   ├── lustre_changelog.conf.j2    # Changelog monitoring config
│   │   └── sanitize_project.sh.j2      # Data sanitization script
│   ├── files/
│   │   └── acl_sync.py                 # ACL-FreeIPA sync script
│   ├── defaults/main.yml
│   ├── vars/main.yml
│   └── meta/main.yml
│
├── hpc_interconnect/
│   ├── tasks/
│   │   ├── main.yml                    # Generate exception documentation
│   │   ├── verify.yml                  # Verify compensating controls
│   │   └── evidence.yml                # Collect boundary evidence
│   ├── templates/
│   │   ├── rdma_exception.md.j2        # Exception document template
│   │   └── compensating_controls.md.j2
│   ├── defaults/main.yml
│   ├── vars/main.yml
│   └── meta/main.yml
│
└── hpc_node_lifecycle/
    ├── tasks/
    │   ├── main.yml                    # Configure node lifecycle
    │   ├── verify.yml                  # Verify node compliance
    │   └── evidence.yml                # Collect node state evidence
    ├── templates/
    │   ├── first_boot.sh.j2            # Post-PXE compliance scan
    │   ├── health_check.sh.j2          # Inter-job health check
    │   └── sanitize_node.sh.j2         # NIST 800-88 sanitization
    ├── defaults/main.yml
    ├── vars/main.yml
    └── meta/main.yml

playbooks/
├── onboard_project.yml                 # CUI project onboarding automation
├── offboard_project.yml                # CUI project offboarding automation
└── vars/
    └── onboarding_defaults.yml

templates/
└── pi_welcome_packet.md.j2             # Plain language PI instructions

docs/
├── hpc_tailoring.yml                   # Updated with implementation details (FR-046)
└── researcher_quickstart.md            # Updated with HPC instructions (FR-047)

tests/
├── integration/
│   ├── test_slurm_prolog.py
│   ├── test_container_security.py
│   └── test_storage_acls.py
└── molecule/
    ├── hpc_slurm_cui/
    ├── hpc_container_security/
    ├── hpc_storage_security/
    ├── hpc_interconnect/
    └── hpc_node_lifecycle/
Structure Decision: Ansible collection structure with 5 HPC-specific roles following the existing pattern from Specs 001-002. Each role has main/verify/evidence task files per Constitution Principle VII. Playbooks for onboarding/offboarding are separate from roles to allow flexible composition.
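Because every role is expected to ship the same three task files, the layout can be checked mechanically. The sketch below is one possible check, not part of the planned test suite. The helper name `missing_task_files` is hypothetical, and the role names and file names are taken from the tree above.

```python
from pathlib import Path

# Task files each role is expected to provide (Constitution Principle VII).
REQUIRED_TASK_FILES = ("main.yml", "verify.yml", "evidence.yml")

# The five roles introduced by this plan.
ROLES = (
    "hpc_slurm_cui",
    "hpc_container_security",
    "hpc_storage_security",
    "hpc_interconnect",
    "hpc_node_lifecycle",
)


def missing_task_files(roles_dir: Path, roles=ROLES) -> dict[str, list[str]]:
    """Map each non-conforming role to the required task files it lacks."""
    missing: dict[str, list[str]] = {}
    for role in roles:
        tasks_dir = roles_dir / role / "tasks"
        absent = [name for name in REQUIRED_TASK_FILES if not (tasks_dir / name).is_file()]
        if absent:
            missing[role] = absent
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    problems = missing_task_files(Path("roles"))
    print(problems or "all roles follow the main/verify/evidence pattern")
```

A check like this could run in CI ahead of the Molecule scenarios, so a role with a missing `verify.yml` or `evidence.yml` fails early.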
Complexity Tracking
No violations to justify; all constitution principles are satisfied. The table below records the remaining sources of complexity and why each is accepted.
| Aspect | Complexity Level | Justification |
|---|---|---|
| Role count | 5 new roles | Each maps to distinct HPC subsystem; cannot be combined without violating separation of concerns |
| Script languages | Bash + Python | Bash required for Slurm prolog/epilog; Python for complex ACL sync logic |
| Filesystem support | Lustre + BeeGFS | Spec requires both; abstraction layer handles differences |
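The plan does not yet define the abstraction layer for Lustre and BeeGFS. One way to picture it is a small backend interface that hides the difference in client tooling. The class names, the `backend_for` factory, and the choice of a group quota query as the example operation are all assumptions made for illustration. The command lines use `lfs quota -g` and `beegfs-ctl --getquota --gid`, and should be checked against the client tools actually installed, since newer BeeGFS releases ship a different CLI.

```python
from abc import ABC, abstractmethod


class ParallelFilesystem(ABC):
    """Hypothetical backend interface; not defined by the spec."""

    def __init__(self, mount_point: str) -> None:
        self.mount_point = mount_point

    @abstractmethod
    def group_quota_command(self, group: str) -> list[str]:
        """Return the argv that reports quota usage for a project group."""


class LustreBackend(ParallelFilesystem):
    def group_quota_command(self, group: str) -> list[str]:
        return ["lfs", "quota", "-g", group, self.mount_point]


class BeeGFSBackend(ParallelFilesystem):
    def group_quota_command(self, group: str) -> list[str]:
        # Assumption: the beegfs-ctl based tooling is in use; verify the flags locally.
        return ["beegfs-ctl", "--getquota", "--gid", group, f"--mount={self.mount_point}"]


_BACKENDS: dict[str, type[ParallelFilesystem]] = {
    "lustre": LustreBackend,
    "beegfs": BeeGFSBackend,
}


def backend_for(fs_type: str, mount_point: str) -> ParallelFilesystem:
    """Select a backend by filesystem type (case-insensitive)."""
    try:
        backend_cls = _BACKENDS[fs_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported filesystem type: {fs_type!r}") from None
    return backend_cls(mount_point)
```

With this shape, role tasks and scripts such as the ACL sync would call `backend_for(...)` once and stay unaware of which filesystem a site runs. Returning an argv list rather than executing it keeps the backends easy to unit test.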